Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

Authors

  • Sylvie Thiébaux
  • Froduald Kabanza
  • John K. Slaney
Abstract

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into an MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline.
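As a rough illustration of the general idea of translating an NMRDP into an equivalent MDP, the sketch below uses a toy problem and is not the paper's FLTL-based construction. Everything in it is an assumption made for the example: the four-state line world, the `successors` dynamics, and the reward "pay 1 only the first time the goal is reached". The reward depends on history, so each expanded state pairs the base state with one bit of history, and expanded states are generated only when they are reached from the initial state.

```python
from collections import deque

# Hypothetical toy problem: states 0..3 on a line, deterministic moves.
ACTIONS = ("left", "right")
GOAL = 3

def successors(state, action):
    """Return [(probability, next_state)] for the toy dynamics."""
    step = 1 if action == "right" else -1
    return [(1.0, max(0, min(GOAL, state + step)))]

def nm_reward(history):
    """Non-Markovian reward: 1.0 only the first time GOAL is reached."""
    return 1.0 if history[-1] == GOAL and GOAL not in history[:-1] else 0.0

def e_reward(e_state):
    """Markovian reward over expanded states (state, goal_seen_before)."""
    state, goal_seen_before = e_state
    return 1.0 if state == GOAL and not goal_seen_before else 0.0

def expand(initial_state):
    """Build only the expanded states reachable from initial_state."""
    start = (initial_state, False)
    rewards = {start: e_reward(start)}
    transitions = {}
    frontier = deque([start])
    while frontier:
        e_state = frontier.popleft()
        state, goal_seen_before = e_state
        for action in ACTIONS:
            outcomes = []
            for prob, nxt in successors(state, action):
                # The history bit records whether GOAL occurred strictly
                # before the successor state.
                e_next = (nxt, goal_seen_before or state == GOAL)
                if e_next not in rewards:
                    rewards[e_next] = e_reward(e_next)
                    frontier.append(e_next)
                outcomes.append((prob, e_next))
            transitions[(e_state, action)] = outcomes
    return rewards, transitions

rewards, transitions = expand(0)
```

In this toy case the expanded MDP has eight states, and only `(3, False)` carries a reward. An anytime solver could interleave this kind of expansion with policy improvement instead of building every expanded state up front.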

Similar resources

A Model-Checking Approach to Decision-Theoretic Planning with Non-Markovian Rewards

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed a...

Properties of Planning with Non-Markovian Rewards

We examine technologies designed to solve decision processes with non-Markovian rewards (NMRDPs). More specifically, target decision processes exhibit Markovian dynamics, called grounded dynamics, and desirable behaviours are modelled as state trajectories specified in a temporal logic. Each technology operates by automatically translating NMRDPs into corresponding equivalent MDPs amenable to c...

Decision-Theoretic Planning with non-Markovian Rewards

A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decision-theoretic planning, where many desirable behaviours are more naturally expressed as properties of execution sequences rather than as properties of states, NMRDPs form a more natural model than the commonly adopted fully Markovi...

Fahiem Bacchus

Markov decision processes (MDPs) are a very popular tool for decision-theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption, that dynamics and rewards depend on the current state only and not on history, is often inappropriate. This is especially true of rewards: we frequently wish to associate rewar...

Rewarding Behaviors

Markov decision processes (MDPs) are a very popular tool for decision-theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption, that dynamics and rewards depend on the current state only and not on history, is often inappropriate. This is especially true of rewards: we frequently wish to associate rew...

Publication date: 2002